Towards a Hybrid Abstract Generation System

Author: Maria ARETOULAKI

[Figure 1: Hybrid System Architecture. Symbolic components (morphological analyser, syntactic analyser, lexicon, semantic analyser, pragmatic analyser) feed an encoder; an ANN-based content selector (an artificial neural network) produces a list of important sentences, which is passed to the abstract generator.]

[...] and semantic implications. Main verb tense and aspect are examples of syntactic attributes with morphological realisations and semantic/pragmatic consequences. Agent quantification and cardinality are semantic features with lexical and syntactic realisation and pragmatic implications. Elaboration, Repetition and Contrast are some of the pragmatic features encoding `Speech Act'-like functions of the clause processed, which also have lexical, semantic, syntactic and morphological aspects. It becomes apparent that the majority of the 80 attributes cross boundaries in the traditional crude classification of language levels and belong to more than a single category. So, in computing the values for these features, lexical as well as structural and contextual information is used. The ANN does not accept specific values for the features; it encodes whether the feature is applicable or not, or whether it is specified in the clause or earlier in the text.

In the first 10 series of experiments, the data employed consisted of whole sentences as they appear in real-world texts. However, in the future, individual clauses will be encoded separately. In this way, only the important ones, whether main or secondary, will be processed by the generator in the construction of the abstract. Accordingly, there are two features in the candidate set recording the presence of a secondary and a main clause, respectively, attached to the one currently analysed. Thus, multiple embedding and conjunction of such clauses can be accommodated.

The input features have been selected in a non-systematic fashion. Most of them have been individually singled out by various pragmatic theories and system developers. Their relative importance, or at least influence, in language understanding is more or less evident. The number and the predication of a noun, for example, determine the way later pronouns are disambiguated, depending on whether they are of the same number and compatible predication or not. Similarly, the presence of a conditional means that whatever proposition is accompanied by it only holds if the conditional itself is true. Still, the exact role each feature plays is not at all clear. This is why the ANN has been run with various combinations of these 80 features, so as to evaluate the relative importance of each empirically. Those features which degrade the network performance will be discarded or perhaps modified, while the others will be corroborated. In a sense, a big set of `solutions' to pragmatic problems was compiled, and the ANN has been handed the task of deciding which are the fittest or most relevant ones for the summarisation task.

5.1 The Experiments

All experiments are being carried out on the PlaNet PDP platform [Miyata, 1991]. In the first instance, only 5 out of the set of 80 features were employed, namely:

1. Focus change (a or 0): records a shift from topic A of the previous sentence to topic B, or to a subtopic of A, in the current one [Grosz, 1986].
2. Explanation (a or 0) and
3. Contrast (a or 0) are both rhetorical types of relations holding between two different sentences or two clauses in the same sentence [cf. Mann and Thompson, 1987].
4. Generalisation (a or 0) is a reference to permanent states or habitual, recurring events, rather than individual, unique instances of them.
5. Finally, Title (a or 0) specifies whether or not a clause is a title or subtitle in the text.
These features were chosen out of the whole set as the most obviously important, because they relate to the structure of the discourse and its emphasised topics.
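To make the encoding concrete, here is a minimal sketch in Python (the paper gives no code, and names such as FEATURES and encode_clause are hypothetical) of how a clause annotated with these five features could be mapped to the `a'/`0' input strings described in the next paragraph:

    # Hypothetical sketch of the binary feature encoding; not the original code.
    # Each feature is recorded as 'a' if applicable/specified, '0' otherwise.

    FEATURES = ["focus_change", "explanation", "contrast", "generalisation", "title"]

    def encode_clause(annotations: dict) -> str:
        """Map a clause's feature annotations to an input string such as 'a00a0'."""
        return "".join("a" if annotations.get(f, False) else "0" for f in FEATURES)

    # Example: a generalising title clause is encoded '000aa'.
    print(encode_clause({"generalisation": True, "title": True}))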
The associative nature of neural networks renders the specific order of the feature presentation irrelevant, as long as it remains consistent across all inputs (for all sentences). Initially, the training and test data were based on 100 sentences taken from a variety of newspaper and scientific articles. These sentences had been encoded as strings of `0's and `a's, depending on the corresponding values for the above 5 features. These strings constituted the input activation values for the network. The difference between the training and the test data is that the former also include a second column of digits for the two manually predetermined output activation values. These represent `importance' (0 or a) and `unimportance' (0 or a), and they have to be `guessed' by the network when not present. Example training sentences, with their input and output activation value vectors, are the following:

    "Our object is to introduce contexts as abstract mathematical entities with properties useful in Artificial Intelligence."
        a00a0    a0    [important]

    "On the mountain highway that snakes northeast from Tirana toward the Serbian border, a white-knuckled drive of 120 miles takes more than eight hours over remnants of pavement originally laid by Communist Youth Brigades in 1947."
        aa0a0    0a    [unimportant]

The 10-fold cross-validation technique was adopted, whereby the overall data is divided into 10 different sets of test data [Weiss and Kulikowski, 1991]. Consequently, ten different combinations of 90 training and 10 test sentences were employed in turn for each series of experiments. When the two sentences above were being used as test data, they were represented by just the left-hand side vector, with the input activation, i.e. `a00a0' and `aa0a0', respectively. The network had an overall 65% success rate with these 5 features.

In a second series of experiments, a sixth feature was added, Goal specification, and the network performance improved, reaching a 69% success rate. Afterwards, four more were added, making up a total of 10 inputs: Time specification, Passiveness, Quantification and Manner. Now the ANN attained an impressive 86% success rate. This is quite remarkable, given that the great majority of the feature patterns were indecisive as regards importance. What this means is that most patterns appeared just once in the data and, consequently, the importance judgement may not have been representative of the overall behaviour of the specific pattern. This is where ANNs seem to outperform the corresponding statistical analyses. Nevertheless, absolute success could not be claimed until a larger test set had been used. So, this time 12 features were used for the encoding of 1,100 sentences by three individuals. Moreover, apart from the aforementioned linguistic and extra-linguistic features, a series of `meaningless' features were employed for the encoding of the same sentences. This was done for validation purposes. Example features are: "The second letter of the third word is a vowel", "The fourth word has more than one `S'" or "The first word ends in a consonant". Instead of different sets of 10, 900-1,000 sentences were tested this time, after the network had been trained with the same initial 100 sentences, which are distinct from the test set. The corresponding success rates ranged from 55.9% to 59.0%. As regards the `meaningless' features, the success rate lay between 49.2% and 50.2%, which is no better than chance.
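The experiments themselves were run on the PlaNet simulator, which is not reproduced here. As a rough, self-contained illustration of the procedure described above -- a small feed-forward network trained by back-propagation [Rumelhart et al., 1986] on the binary vectors, with a sentence judged important when the `importance' output unit exceeds the `unimportance' one, evaluated by 10-fold cross-validation -- the following Python/numpy sketch can serve. The toy data set, hidden-layer size, learning rate and number of epochs are all assumptions, not values reported in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def to_vec(s):  # 'a00a0' -> array([1., 0., 0., 1., 0.])
        return np.array([1.0 if c == "a" else 0.0 for c in s])

    # Tiny toy data set standing in for the 100 encoded sentences:
    # (input string, output string), the outputs being the
    # 'importance' and 'unimportance' units.
    data = [("a00a0", "a0"), ("aa0a0", "0a"), ("000a0", "0a"), ("a0aa0", "a0"),
            ("0000a", "a0"), ("0a000", "0a"), ("aaaa0", "a0"), ("00000", "0a"),
            ("a0000", "a0"), ("0aa00", "0a")]
    X = np.array([to_vec(i) for i, _ in data])
    Y = np.array([to_vec(o) for _, o in data])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train(X, Y, hidden=8, lr=0.5, epochs=2000):
        """One-hidden-layer net trained by vanilla back-propagation."""
        W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
        W2 = rng.normal(0, 0.5, (hidden, Y.shape[1]))
        for _ in range(epochs):
            H = sigmoid(X @ W1)           # forward pass
            O = sigmoid(H @ W2)
            dO = (O - Y) * O * (1 - O)    # output-layer error (squared loss)
            dH = (dO @ W2.T) * H * (1 - H)
            W2 -= lr * H.T @ dO           # gradient-descent updates
            W1 -= lr * X.T @ dH
        return W1, W2

    def predict_important(x, W1, W2):
        o = sigmoid(sigmoid(x @ W1) @ W2)
        return o[0] > o[1]  # importance unit beats unimportance unit

    # 10-fold cross-validation: each fold is held out once as test data.
    folds = np.array_split(np.arange(len(X)), 10)
    correct = total = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        W1, W2 = train(X[train_idx], Y[train_idx])
        for i in test_idx:
            correct += predict_important(X[i], W1, W2) == (Y[i][0] > Y[i][1])
            total += 1
    print(f"success rate: {100 * correct / total:.1f}%")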
The next step was to establish the relationship between the training and the test data size and its effect on network performance. It was made sure that the training data set consisted of sentences encoded by all 3 individuals. As shown in Figure 2, the success rates are generally low for the meaningful features, with the exception of training data sizes of 250 and 700 sentences, which gave the highest rates, 58.4% and 58.2%, respectively. This could mean that 250 sentences are sufficient for future training that gives results as good as if 700 sentences were used. In addition, the graph suggests that the drop in performance with training sizes greater than 500 may be due to over-training of the network. At any rate, despite the mediocrity of the results for the meaningful features, the success rates for the meaningless ones are even lower, around 50%, and never exceeding 54.0%.

[Figure 2: The effect of the training data size on performance. x-axis: number of training stimuli (100-800); y-axis: % correct output (45-60); one curve for the cross-level features, one for the irrelevant linguistic features.]
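A sweep like the one summarised in Figure 2 could be scripted as follows. This is a hypothetical reconstruction, reusing the train and predict_important helpers from the previous sketch and a randomly generated stand-in corpus instead of the real encoded sentences; the planted importance rule exists only to give the toy network something learnable.

    import numpy as np

    rng2 = np.random.default_rng(1)

    # Stand-in corpus: random 12-feature binary vectors with a planted rule
    # ("important iff the first feature is on"); purely illustrative.
    X_all = (rng2.random((1100, 12)) > 0.5).astype(float)
    Y_all = np.stack([X_all[:, 0], 1.0 - X_all[:, 0]], axis=1)
    X_test, Y_test = X_all[800:], Y_all[800:]     # fixed held-out test set

    def success_rate(X_tr, Y_tr):
        W1, W2 = train(X_tr, Y_tr)                # helpers from the sketch above
        hits = sum(predict_important(x, W1, W2) == (y[0] > y[1])
                   for x, y in zip(X_test, Y_test))
        return 100.0 * hits / len(X_test)

    for size in range(100, 801, 100):             # x-axis of Figure 2
        rate = success_rate(X_all[:size], Y_all[:size])
        print(f"{size:4d} training stimuli -> {rate:.1f}% correct")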
6 Conclusion and Future Work

In this paper, a hybrid architecture has been described for a system that generates abstracts of unrestricted texts. Symbolic morphological, syntactic, semantic and pragmatic processors have their selective results encoded in binary form in an interface to an Artificial Neural Network (ANN). The ANN takes the encoded feature values for each sentence of the text and computes two output values determining the relative degree of importance and unimportance of the sentence currently analysed, in the summarisation of the whole text it belongs to. Finally, a symbolic generator collects the sentences that have been judged as more important than not and outputs the abstract of the initial text, after applying a number of syntactic and semantic modifications.

A central claim of the paper is that ANNs are ideal for the simulation and exploitation of the strong interaction between the diverse linguistic and extra-linguistic factors in language comprehension. They have the potential for faring better than traditional symbolic techniques in such complex tasks as pragmatic decision-making. As natural language is de facto creative, ambiguous and imperfect, unpredictability is the rule rather than the exception, which renders symbolic systems extremely vulnerable. ANNs, on the other hand, can handle novel or noisy input, because they are able to learn through experience and generalise over what was learned. Still, to date ANNs have mainly been used for lower-level NLP tasks that symbolic systems already do quite well, and not for pragmatics.

In the immediate future, further experiments have to be conducted with different combinations of 12 features from the candidate set of 80. Thus, the interdependence among the diverse features will be established more rigorously, and their relative influence in rendering a clause important will be clarified. Moreover, it may be the case that the number of features used so far is still quite small. Twelve features may not be enough to establish importance, hence an average success rate that is only 8.4% better than random. At least the features already used are not completely irrelevant, as was shown in the comparison of their success rates with those of the meaningless features.

In addition, sentence encoding needs to be further standardised, so that discrepancies do not occur among the sentences encoded by one or another individual. This seems to have been the case up to now, rendering the data noisy. The noise is also increased by the fact that diverse text types make up the corpus: 55 news reports and scientific articles of varying length and subject matter.

Finally, different topologies and network types have to be experimented with, e.g. the simple recurrent network. In this way, a comparison will be possible with the amount of generalisation and the learning rate of the simple back-propagation network. To this end, the established morphological, syntactic and semantic/pragmatic processors should be combined with the ANN, in order to produce a powerful hybrid system that outperforms both approaches when adopted individually.

References

[Alshawi, 1992] Alshawi, H., editor (1992). The Core Language Engine. ACL-MIT Press Series in Natural Language Processing. A Bradford Book, MIT Press, Cambridge, Massachusetts.
[Appelt et al., 1993] Appelt, D. E., Hobbs, J. R., Bear, J., Israel, D., and Tyson, M. (1993). FASTUS: A finite-state processor for information extraction from real-world text. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1172-1178, Chambery, France. Morgan Kaufmann.
[Bennett, 1988] Bennett, T. J. A. (1988). Self-Organizing Systems and Transformational-Generative (TG) Grammar. Cybernetics and Systems: An International Journal, 19:61-81.
[Carling, 1992] Carling, A. (1992). Introducing Neural Networks. Sigma Press, Wilmslow, Cheshire, U.K.
[Chalmers et al., 1992] Chalmers, D. J., French, R. M., and Hofstadter, D. R. (1992). High-level Perception, Representation, and Analogy: A Critique of Artificial Intelligence Methodology. Journal of Experimental and Theoretical Artificial Intelligence, 4:185-211.
[Cowie et al., 1993] Cowie, J., Guthrie, L., Jin, W., Wang, R., Wakao, T., Pustejovsky, J., and Waterman, S. (1993). CRL/Brandeis: Description of the Diderot system as used for MUC-5. In ARPA, Proceedings of the Fifth Message Understanding Conference (MUC-5), pages 161-176, Baltimore, Maryland. Morgan Kaufmann.
[Diederich, 1990] Diederich, J. (1990). An Explanation Component for a Connectionist Inference System. In Aiello, L. C., editor, Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90), pages 222-227, Stockholm, Sweden. Pitman Publishing.
[Fogelman-Soulie, 1992] Fogelman-Soulie, F. (1992). Neural Networks and Artificial Intelligence: Competition and Cooperation. In Proceedings of the 10th European Conference on Artificial Intelligence (ECAI-92): Invited lecture, Vienna, Austria.
[Grosz, 1986] Grosz, B. J. (1986). The representation and use of focus in a system for understanding dialogs. In Grosz, B. J., Sparck Jones, K., and Webber, B. L., editors, Readings in Natural Language Processing. Morgan Kaufmann Publishers, California.
[Ide and Vernois, 1990] Ide, N. M. and Vernois, J. (1990). Very Large Neural Networks for Word Sense Disambiguation. In Aiello, L. C., editor, Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90), pages 366-369, Stockholm, Sweden. Pitman Publishing.
[Lucas et al., 1993] Lucas, N., Nishina, K., Akiba, T., and Suresh, K. G. (1993). Discourse analysis of scientific textbooks in Japanese: A tool for producing automatic summaries. Technical Report 93TR-0004, Dept. of Computer Science, Tokyo Institute of Technology, Tokyo, Japan.
[Mann and Thompson, 1987] Mann, W. C. and Thompson, S. A. (1987). Rhetorical structure theory: A framework for the analysis of texts. Technical Report RS-87-185, USC Information Sciences Institute, Marina Del Rey, Ca.
[Mauldin, 1991] Mauldin, M. L. (1991). Conceptual Information Retrieval: A Case Study in Adaptive Partial Parsing. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[Mauldin et al., 1987] Mauldin, M. L., Carbonell, J. G., and Thomason, R. H. (1987). Beyond the keyword barrier: Knowledge-based information retrieval. Information Services and Use, 7.
[Maybury, 1990] Maybury, M. T. (1990). Planning Multisentential English Text Using Communicative Acts. PhD thesis, Cambridge University Computer Laboratory.
[McClelland and Kawamoto, 1986] McClelland, J. L. and Kawamoto, A. H. (1986). Mechanisms of sentence processing: Assigning roles to constituents of sentences. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press, Cambridge, Mass.
[McClelland and Rumelhart, 1981] McClelland, J. L. and Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1, an account of basic findings. Psychological Review, 88:375-407.
[Miikkulainen, 1992] Miikkulainen, R. (1992). DISCERN: A distributed neural network model of script processing and memory. In Drossaers, M. F. J. and Nijholt, A., editors, 3rd Twente Workshop on Language Technology on Connectionism and Natural Language Processing (TWLT-3), Dept. of Computer Science, University of Twente, The Netherlands.
[Miyata, 1991] Miyata, Y. (1991). A User's Guide to PlaNet Version 5.6: A Tool for Constructing, Running, and Looking into a PDP Network.
[Rumelhart et al., 1986] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Internal Representations by Error Propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition; Vol. 1: Foundations. The MIT Press, Cambridge, Massachusetts.
[Scholtes, 1990] Scholtes, J. C. (1990). Neural Networks in Natural Language Processing and Information Retrieval. PhD thesis, University of Amsterdam, The Netherlands.
[Sejnowski and Rosenberg, 1986] Sejnowski, T. J. and Rosenberg, C. (1986). NETtalk: A parallel network that learns to read aloud. Technical report, Johns Hopkins University.
[Touretzky, 1989] Touretzky, D. S. (1989). Connectionism and compositional semantics. Technical report, Computer Science Department, Carnegie-Mellon University.
[Weiss and Kulikowski, 1991] Weiss, S. M. and Kulikowski, C. (1991). Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, San Mateo.
[Wermter, 1992] Wermter, S. (1992). A Hybrid and Connectionist Architecture for a Scanning Understander. In Proceedings of the 10th European Conference on Artificial Intelligence (ECAI-92), Vienna, Austria.
[Young and Hayes, 1985] Young, S. R. and Hayes, P. J. (1985). TESS: A telex classifying and summarisation system. In The Second IEEE Conference on AI Applications. IEEE.
